Assessing sequence comparison methods with the average precision criterion
Author
Abstract
MOTIVATION Comprehensive performance assessment is important for improving sequence database search methods. Sensitivity, selectivity and speed are three major yet usually conflicting evaluation criteria. The average precision (AP) measure aims to combine the sensitivity and selectivity features of a search algorithm in a single score. It can be easily visualized and extended to analyze results from a set of queries. Finally, the time-AP plot can clearly show the overall performance of different search methods.
RESULTS Experiments are performed based on the SCOP database. Popular sequence comparison algorithms, namely Smith-Waterman (SSEARCH), FASTA, BLAST and PSI-BLAST, are evaluated. We find that (1) the low-complexity segment filtration procedure in BLAST actually harms its overall search quality; (2) AP scores of different search methods are approximately proportional to the logarithm of search time; and (3) homologs in protein families with many members tend to be more obscure than those in small families. This measure may be helpful for developing new search algorithms and can guide researchers in selecting the most suitable search methods.
AVAILABILITY Test sets and source code of this evaluation tool are available upon request.
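The abstract does not spell out the AP formula. As a minimal sketch, assuming the common information-retrieval definition (precision is recorded at the rank of each true homolog in the hit list, and the sum is divided by the total number of homologs in the database), the score for one query and its extension to a set of queries might look like the following. The function names and the boolean hit-list representation are hypothetical and are not taken from the authors' evaluation tool.

```python
from typing import Sequence


def average_precision(ranked_is_homolog: Sequence[bool], total_homologs: int) -> float:
    """AP for one query (assumed IR-style definition, not necessarily the paper's).

    ranked_is_homolog: hit list in rank order, True where the hit is a true homolog.
    total_homologs: number of true homologs of the query in the whole database,
                    so homologs missing from the hit list lower the score.
    """
    if total_homologs <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_is_homolog, start=1):
        if is_hit:
            hits += 1
            # Precision at this rank: true homologs so far / hits examined so far.
            precision_sum += hits / rank
    return precision_sum / total_homologs


def mean_average_precision(queries: Sequence[tuple[Sequence[bool], int]]) -> float:
    """Average the per-query AP over a set of queries."""
    if not queries:
        return 0.0
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


if __name__ == "__main__":
    # Toy hit list: homologs found at ranks 1 and 3, two homologs in total.
    print(average_precision([True, False, True, False], 2))
```

Under this definition a method scores high only if it both retrieves the homologs (sensitivity) and ranks them ahead of unrelated sequences (selectivity), which is the combination the abstract describes.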
Similar resources
A Comparison of Thin Plate and Spherical Splines with Multiple Regression
Thin plate and spherical splines are nonparametric methods suitable for spatial data analysis. Thin plate splines provide efficient, practical and high-precision solutions in spatial interpolation. Two components are considered in the model fitting: spatial deviations of the data and the model roughness. On the other hand, in parametric regression, the relationship between explanatory and response v...
Evaluation of Updating Methods in Building Blocks Dataset
With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...
Evaluation of Accuracy, Precision and Agreement of Five HbA1c Measurement Methods with HPLC Reference Method
ABSTRACT Background and Objective: The current challenge of diabetes mellitus is to prevent its complications. These complications are directly associated with hyperglycemia in diabetics. The HbA1c measurement is essential for long-term glycemic control. Synchronization of HbA1c measurement is important in order to avoid discrepancies between resu...
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Comparison of ISSR and AFLP markers in assessing genetic diversity among Nettle (Urtica dioica L.) populations.
Urtica dioica is an important medicinal plant which is widely distributed in Mazandaran province (North of Iran). In this study for the first time Amplified Fragment Length Polymorphism (AFLP) and Inter-simple Sequence Repeat (ISSR) markers were used for detection of genetic polymorphism in Mazandaran nettle. Ten AFLP primer combinations and seventeen ISSR markers were utilized. AFLP produced 830...
Journal title:
- Bioinformatics
Volume 19, Issue 18
Pages -
Publication date 2003